Bilingual Example Segmentation based on Markers Hypothesis
Authors
Abstract
The Marker Hypothesis was first defined by Thomas Green in 1979. It is a psycholinguistic hypothesis stating that every language has a set of words that mark the boundaries of phrases in a sentence. Although it remains an unproven hypothesis, tests have shown that chunkers built on it give results comparable to those of basic shallow parsers, with higher efficiency. The chunking algorithm based on the Marker Hypothesis is simple, fast and almost language independent. It depends only on a list of closed-class words, which is already available for most languages. This makes it suitable for bilingual chunking, since no separate shallow parser is required for each language. This paper discusses the use of the Marker Hypothesis, combined with Probabilistic Translation Dictionaries, to extract example-based machine translation resources from parallel corpora.
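The chunking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the marker lists, the `MARKERS` table, the `marker_chunks` function and the choice of English and Portuguese are all assumptions made for illustration, and real closed-class word lists are much larger. The sketch opens a new chunk at each marker word, and keeps consecutive markers together so that a chunk is not made of markers alone.

```python
# Hypothetical, deliberately tiny marker lists (closed-class words).
# Real lists would cover determiners, prepositions, conjunctions, pronouns, etc.
MARKERS = {
    "en": {"the", "a", "an", "of", "in", "on", "to", "with", "and", "that"},
    "pt": {"o", "a", "os", "as", "um", "uma", "de", "do", "da",
           "em", "no", "na", "com", "e", "que"},
}


def marker_chunks(sentence, lang):
    """Split a sentence into chunks, starting a new chunk at each marker word.

    A marker that directly follows another marker stays in the same chunk,
    so a chunk is closed only after it has received a non-marker word.
    """
    markers = MARKERS[lang]
    chunks, current = [], []
    has_content = False  # True once the current chunk holds a non-marker word
    for token in sentence.lower().split():
        is_marker = token in markers
        if is_marker and has_content:
            # A marker after content words signals a phrase boundary.
            chunks.append(current)
            current, has_content = [], False
        current.append(token)
        if not is_marker:
            has_content = True
    if current:
        chunks.append(current)
    return [" ".join(chunk) for chunk in chunks]


print(marker_chunks("The cat sat on the mat with a hat", "en"))
print(marker_chunks("O gato sentou no tapete com um chapéu", "pt"))
```

Because the same function serves both languages and only the word list changes, the two sides of a parallel sentence can be chunked with one piece of code. Pairing the resulting chunks across languages would additionally need translation probabilities, which this sketch does not attempt.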
Similar resources
A Structural-Based Approach to Cantonese-English Machine Translation
In this paper, we present an integrated approach to machine translation from Cantonese to English text. Our method combines example-based and rule-based methods that rely solely on example translations kept in a small Example Base (EB). One of the bottlenecks in example-based Machine Translation (MT) is a lack of knowledge or redundant knowledge in its bilingual knowledge base. In our method, a f...
An Investigation of the Effect of Bilingual Education on Language Achievement of Iranian Pre-intermediate EFL Learners
The present study investigated the impact of bilingual education on the language achievement of Iranian Pre-intermediate EFL learners. Bilingual education was delivered through content-based methodology, using subject matter such as math, science and reading. To this end, the researchers used 40 Pre-intermediate EFL participants who were studying English conversation at a private language insti...
Pre-processing of Bilingual Corpora for Mandarin-English EBMT
Pre-processing of bilingual corpora plays an important role in Example-Based Machine Translation (EBMT) and Statistical-Based Machine Translation (SBMT). For our Mandarin-English EBMT system, pre-processing includes segmentation for Mandarin, bracketing for English and building a statistical dictionary from the corpora. We used the Mandarin segmenter from the Linguistic Data Consortium (LDC). I...
Example-based Segmentation of Swedish Compounds in a Swedish–English bilingual corpus and the possibility of Evaluating Compound Links based on that Segmentation
In this paper an algorithm for segmenting Swedish compounds in a linking material is presented. The algorithm does the segmentation by looking at the example set by the corresponding English compound. The idea that this kind of segmentation can be used to evaluate the link between the two compounds is also tested. This would be possible because links where the algorithm cannot find suitable Swe...